ABSTRACT 



This report addresses challenges associated with the 
reporting of test scores of students with disabilities participating in 
alternate assessments. It summarizes six models currently under construction 
or already being used by states. Using proficiency levels as a common 
reporting approach, the six models are: (1) same proficiency levels for 
general assessment and alternate assessment; (2) different proficiency levels 
for general and alternate assessments but treated as the same; (3) different 
proficiency levels for general assessment and alternate assessment; (4) 
overlapped proficiency levels for general assessment and alternate 
assessment; (5) lowest possible proficiency level for alternate assessment; 
and (6) no alternate assessment proficiency levels. The pros and cons of each 
of the six models are addressed, along with the implications of using each 
model. The importance of collecting data on these models as they are 
implemented is stressed. (DB) 
Executive Summary 



Reporting the scores of students with disabilities participating in alternate assessments raises a 
number of challenges, including those surrounding concerns about statistical soundness, as 
well as those related to the different purposes and focuses that characterize current alternate 
assessments. Across the nation, states have reached different decisions about how to report the 
results of their alternate assessments. This report summarizes six models currently under 
construction, or in some cases, already being used by states. Using proficiency levels as a common 
reporting approach, the six models are: 

Model 1: Same proficiency levels for general assessment and alternate assessment 

Model 2: Different proficiency levels for general and alternate assessments are treated 
as the same 

Model 3: Different proficiency levels for general assessment and alternate assessment 

Model 4: Overlapped proficiency levels for general assessment and alternate assessment 

Model 5: Lowest possible proficiency level for alternate assessment 

Model 6: No alternate assessment proficiency levels 

The pros and cons of each of the six models are addressed, along with the implications of using 
each model. It will be important to monitor the impact of the different approaches over time. 
In response to the 1997 reauthorization of the Individuals with Disabilities Education Act (IDEA 
97) and Title I of the Elementary and Secondary Education Act (ESEA), states are now conducting 
alternate assessments for students with disabilities who cannot participate in general state 
assessments, even with accommodations or modifications. The work thus far has involved 
development of many different assessment strategies nationally, including checklists, reviews 
of records, surveys, performance events, documentation of progress on IEPs, and the collection 
of various types of evidence into paper or electronic portfolios (Thompson & Thurlow, 2001). 
Since students with disabilities participate in general assessments with and without 
accommodations, the alternate assessment population represents only a segment of students 
with disabilities and generally a very small segment of the total student population. Generally, 
states have identified up to 2.5% of the total student population or about 20% of students with 
disabilities as appropriate for their alternate assessments. 

States are at the point of deciding how their alternate assessments will be scored and reported. 
Regardless of the manner in which the assessments were conducted, or the extent to which 
reliability and validity of scores have been established, the results are to be reported publicly. 
IDEA and Title I requirements are not prescriptive about how results are to be reported. IDEA 
97 (Section 300.139) requires states to publicly report on alternate assessment participation and 
performance (see Table 1). Title I (Section 1111) requires states to disaggregate the results for 
students with disabilities compared to nondisabled students, and to provide for the reporting of 
results to be included in a public report on school progress. According to Summary Guidance 
on the Inclusion Requirement for Title I Final Assessments (Cohen, 2000), “Whatever assessment 
approach is taken [referring to standard assessment, assessment with accommodations, or 
alternate assessment], the scores of students with disabilities must be included in the assessment 
system for purposes of public reporting and school and district accountability” (p. 2). 

As states are determining how the results of alternate assessments will be reported, the question 
arises as to how the results will be presented in relation to the reports of their general assessments. 
This paper presents six models that are currently in use or being considered to situate the alternate 
assessment results within states' reporting systems. 



Issues that Have an Impact on Reporting Decisions 

Several factors potentially could have an impact on decisions about reporting alternate assessment 
results. The three addressed here are among the more salient within the context of standards- 
based reform. 
Table 1. IDEA 97 Language about Reporting Assessment Participation and Performance 



(a) . . . the SEA shall make available to the public, and report to the public with the same frequency 
and in the same detail as it reports on the assessment of nondisabled children, the following 
information: 

(1) The number of children with disabilities participating - 

(i) In regular assessments; and 

(ii) In alternate assessments. 

(2) The performance results of the children described in paragraph (a)(1) of this section if 
doing so would be statistically sound and would not result in the disclosure of 
performance results identifiable to individual children - 

(i) On regular assessments (beginning not later than July 1, 1998); and 

(ii) On alternate assessments (not later than July 1, 2000). 

(b) . . . Reports to the public under paragraph (a) of this section must include - 

(1) Aggregated data that include the performance of children with disabilities together with 
all other children; and 

(2) Disaggregated data on the performance of children with disabilities. 

[Authority: 20 U.S.C. 61(a)(17)(B)] 



Statistical Soundness 

There is quite a bit of controversy over the concept of “statistically sound.” It is a discussion 
that relates to the soundness of the scores on the assessment (reliability and validity), the 
aggregation of the scores from alternate assessments with general assessments, and the 
aggregation of scores from general assessments administered with standard and non-standard 
administrations (see Thurlow & Wiener, 2000). 

It is important to continue to address these issues. My purpose here is not to explore the technical 
issues involved in the aggregation of scores from alternate assessments with scores from general 
assessment, but rather to identify different ways in which it could be done. Throughout this 
discussion, however, it is important to recognize that the technical issues have a significant 
impact on the discussion of how scores are reported. Still, even when scores are determined not 
to be “statistically sound” or when it has been determined that they will not be aggregated with 
other scores for reporting, the federal mandates suggest that they be visible. 



Purpose and Focus 

There are numerous variables that have an impact on a state's decision about how scores will be 
reported. One is the purpose of the assessment. Different types of reports may be used if the 
assessment will be used for instructional programming rather than for accountability purposes, 
or to compare schools. 

Several types of reports are under discussion in many states. The following four types exemplify 
some of the options states are considering. One type of report includes all students on all 
assessments (100% of the total student population). Another includes all students on the general 
assessment with or without accommodations, and in some states, with non-standard 
accommodations (approximately 98% of the total student population). A third type shows all 
students with disabilities (approximately 10% of the total student population) on all assessments. 
Last, the report may display the results of students with disabilities on the alternate assessment, 
sometimes including students in the general assessment with non-standard accommodations, or 
taking off-level or out-of-level tests. There is almost as much variability in the reports as there 
are states. 

Embedded in the purpose of assessment are determinations of what is assessed - the focus of 
the assessment. Some states, such as Kentucky, South Carolina, Tennessee and Rhode Island, 
developed rubrics that focus not only on student achievement but also directly evaluate programs 
(Thompson & Thurlow, 2001). This is in contrast to states, such as Massachusetts and Colorado, 
which have determined that student achievement will be the only indicator of program quality 
or improvement. This focus of the assessment also has an impact on how scores are reported. 

Stakes 

The consequences of the assessment bring other considerations for reporting. A state that uses 
the assessment to determine graduation or grade promotion will likely have different reporting 
requirements than states with school or district-level consequences. At this time, two states 
(Massachusetts and Ohio) are considering the use of their alternate assessments as a way for 
students to earn a state diploma. Other states see the alternate assessment as a path to a different 
certificate. In some states, the report format reflects the decision that the skills required for 
proficiency on the alternate assessment are at a lower level than the skills required for proficiency 
on the general assessment. 



This is the first year that most states will produce reports on alternate assessments. Many 
approaches to reporting the results of alternate assessments are emerging as states consider the 
purposes of their assessments, the statistics involved, the requirements of their accountability 
systems, the federal requirements, and the stakes attached. In viewing the various approaches, 
there seem to be six models of reporting currently under construction. While these models 
probably are not exhaustive, I present them here to illustrate simply and graphically some of the 
options that exist at this time. 

All states have at least three levels of proficiency. However, most states report their general 
assessment results using four levels, and some use more. For my purpose here, four levels 
are used to demonstrate the relationship of the alternate assessment to the general assessment. 
The pros, cons, and implications of each model are presented also. 



Model 1 

In Model 1 (shown in Figure 1), the scores of students in the alternate assessment are placed 
into one of four levels of proficiency, just as general assessment scores are. When reported, the 
alternate assessment scores are aggregated with the general assessment scores in the appropriate 
corresponding proficiency category. The scores of the alternate assessment carry the same weight 
in the reporting (and perhaps in the accountability system) as do the scores of the general 
assessment. 
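
The aggregation described for Model 1 can be illustrated with a minimal Python sketch. The function name, the level codes 1 through 4, and the counts for the hypothetical school are assumptions made for illustration only; they are not drawn from any state's data or reporting system.

```python
from collections import Counter

LEVELS = (1, 2, 3, 4)  # illustrative level codes; labels vary by state


def model1_report(ga_levels, aa_levels):
    """Percent of all students at each level, with GA and AA scores pooled."""
    counts = Counter(ga_levels) + Counter(aa_levels)
    total = len(ga_levels) + len(aa_levels)
    if total == 0:
        raise ValueError("no students to report")
    return {level: 100.0 * counts[level] / total for level in LEVELS}


# Hypothetical school: 96 general assessment (GA) students and
# 4 alternate assessment (AA) students.
ga = [1] * 20 + [2] * 30 + [3] * 36 + [4] * 10
aa = [1, 2, 3, 3]
report = model1_report(ga, aa)
for level, pct in report.items():
    print(f"Proficiency Level {level}: {pct:.1f}% of all students (GA + AA)")
```

In this sketch an alternate assessment score at Level 3 adds to the Level 3 percentage exactly as a general assessment score does, which is the defining feature of Model 1.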

Pros. There are several pro-Model 1 statements. Among them are the following reasons why an 
approach that places all students in the same proficiency levels might be positive: 

• The scores of alternate assessments are valued as equal to the scores of general assessments. 



Figure 1. Model 1, Same Proficiency Levels 

Proficiency Levels 

Proficiency Level 1:  GA + AA.  Includes total % of all students in Proficiency Level 1 
Proficiency Level 2:  GA + AA.  Includes total % of all students in Proficiency Level 2 
Proficiency Level 3:  GA + AA.  Includes total % of all students in Proficiency Level 3 
Proficiency Level 4:  GA + AA.  Includes total % of all students in Proficiency Level 4 

Note: GA = general assessment; AA = alternate assessment 

Proficiency levels vary by state; the four in this table are just examples and could represent labels like the following: 
1 = novice, failing, unsatisfactory; 2 = partially proficient, needs improvement; 3 = proficient, meets expectations; 
4 = exceeds expectations, advanced. 

• The policy benefits of treating the scores as the same are viewed as outweighing the technical 
soundness concerns about combining scores from different assessments in the same report. 

• One policy benefit is that schools are encouraged to take responsibility for the learning of all 
students. 

• The unit of reporting or accountability (classroom, school, or district) does not perceive that 
“the scores of those students pull down the ratings.” In fact, the alternate assessment scores 
may actually improve the overall ratings of a classroom, school, or district when the scores 
from the alternate assessment have an equal chance to be high and are counted the same as 
a high score from the general assessments. 

Cons. There are several statements that can be made about Model 1 that are contrary to its 
support. Among them are the following reasons why an approach that places all students in the 
same proficiency levels might be negative: 

• The assessments are different but are reported together, an approach that is viewed by some 
as “statistically unsound.” 

• This model may be inappropriate when the state has assessments with high stakes for students 
(e.g., diploma). When the alternate assessment is used for high stakes for students, a different 
model (perhaps with skills assessed on the alternate assessment shown at a level comparable 
to skills assessed on the general assessment) may be needed when students must demonstrate 
proficiency related to a grade level benchmark to earn a diploma. 

Implications. Model 1 has a number of implications for its use. Among these are the following 
implications: 

• Model 1 currently is considered by states where the unit of reporting or accountability is the 
school or the district, but not for individual students. These states tend to have a stronger 
focus on program evaluation and improvement. 

• Reports of combined scores may be difficult to interpret and explain, unless scores are also 
disaggregated. Reports are sometimes accompanied by text that explains that different 
assessments are reflected in the scores; this approach may be needed for clearer interpretation 
and understanding. 



Model 2 

Model 2 (see Figure 2) is described as the “apples + oranges = fruit” model (Roeber, 2001). It 
acknowledges that the general assessment and alternate assessment are different measures and 
does not try to mix “apples” and “oranges.” Instead, it allows that a score on the alternate 
assessment holds the same value as a score in the same proficiency level on the general assessment 
and can be reported as “fruit.” In other words, the effect of earning a “2” on either assessment 
would be the same for educators in that they would investigate how instruction might be improved 
for both students if they received a score that was below “acceptable” relative to the scoring 
system. 

Figure 2. Model 2, Different Proficiency Levels Treated as Same 

Proficiency Level 4 (example column) 

General Assessment Proficiency Level 4:  GA description and GA % 
GA + AA:  Includes total % of students in both GA Proficiency Level 4 and AA Proficiency Level 4 
AA Proficiency Level 4:  AA description and AA % 

Note: GA = general assessment; AA = alternate assessment 

Proficiency levels vary by state; the four in this table are just examples and could represent labels like the following: 
1 = novice, failing, unsatisfactory; 2 = partially proficient, needs improvement; 3 = proficient, meets expectations; 
4 = exceeds expectations, advanced. 
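
A minimal Python sketch of a Model 2 style report follows. It reports each assessment separately and also as a combined figure for the same level. The function name and counts are hypothetical, and the sketch assumes that every percentage uses the total number of tested students as its denominator, so that the combined figure is the sum of the two separate ones; a state could choose other denominators.

```python
def model2_report(ga_counts, aa_counts):
    """Per-level GA %, AA %, and combined % (all out of every tested student)."""
    total = sum(ga_counts.values()) + sum(aa_counts.values())
    if total == 0:
        raise ValueError("no students to report")
    rows = {}
    for level in sorted(set(ga_counts) | set(aa_counts)):
        ga_n = ga_counts.get(level, 0)
        aa_n = aa_counts.get(level, 0)
        rows[level] = {
            "ga_pct": 100.0 * ga_n / total,                 # "apples"
            "aa_pct": 100.0 * aa_n / total,                 # "oranges"
            "combined_pct": 100.0 * (ga_n + aa_n) / total,  # "fruit"
        }
    return rows


# Hypothetical counts of students at each level (96 GA, 4 AA).
ga_counts = {1: 20, 2: 30, 3: 36, 4: 10}
aa_counts = {1: 1, 2: 1, 3: 2, 4: 0}
rows = model2_report(ga_counts, aa_counts)
for level, row in rows.items():
    print(level, row)
```

Keeping the separate and combined figures side by side is what distinguishes this sketch from a simple pooled count: the reader can see both that the measures differ and that they count equally.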



Pros. There are several pro-Model 2 statements. Among them are the following reasons why an 
approach that considers the proficiency levels to be different, but counts them as the same 
might be positive: 

• The same value operates in Model 2 as in Model 1 in that the scores of the alternate assessments 
are valued as equal to the scores of the general assessments. 

• This model encourages schools to take responsibility for the learning of all students because 
all count in the same way. 

• The unit of reporting and accountability (classroom, school, or district) does not perceive 
that “the scores of those students pull down the ratings.” Alternate assessment scores may 
actually improve the overall ratings of a classroom, school, or district. 

• The assessments are different and are reported separately as well as together; this fosters 
clarity and discourages confusion. 

Cons. There are several statements that can be made about Model 2 that are contrary to its use. 
Among them are the following reasons why an approach that places all students in different 
proficiency levels, but then merges them, might be negative: 

• Some might argue that this approach is still “statistically unsound,” in that the aggregation is 
technically not appropriate. 

• When the alternate assessment is used for purposes that carry high stakes for students, 
a report may be needed that shows the achievement level of alternately assessed 
students at a level comparable to that of generally assessed students. 

• The report format may be difficult for parents to interpret. 

Implications. Model 2 has a number of implications for its use. Among these are the following 
implications: 

• This approach reaps the benefits of equitable consequences while avoiding the potential 
misinterpretation that the knowledge and skills demonstrated on the alternate assessment 
are the same as those demonstrated on the general assessment. 



Model 3 

In Model 3 (see Figure 3), there can be no aggregation by proficiency level, since the number of 
proficiency levels on the alternate assessment is intentionally different from the number of 
proficiency levels on the general assessment. The total number of students in the denominators 
of the alternate assessment and the general assessment may or may not be summed to ensure 
that there is accounting for 100% of the students. 
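
The denominator question in Model 3 can be made concrete with a short Python sketch. The two scales are reported separately, each over its own test takers, and the two denominators are summed only to check that every enrolled student is accounted for. The function name, the level labels, the two-level alternate assessment scale, and the enrollment figure are all hypothetical.

```python
def model3_report(ga_counts, aa_counts, enrolled):
    """Separate GA and AA distributions plus a check on total accounting."""
    ga_n = sum(ga_counts.values())
    aa_n = sum(aa_counts.values())
    if ga_n == 0 or aa_n == 0:
        raise ValueError("each assessment needs at least one student")
    return {
        # Each scale is reported only against its own denominator.
        "ga_pct": {lvl: 100.0 * n / ga_n for lvl, n in ga_counts.items()},
        "aa_pct": {lvl: 100.0 * n / aa_n for lvl, n in aa_counts.items()},
        # Summing the denominators shows whether anyone was left out.
        "unaccounted": enrolled - (ga_n + aa_n),
    }


# Hypothetical school: four GA levels, two AA levels, 102 students enrolled.
ga_counts = {1: 24, 2: 24, 3: 36, 4: 12}
aa_counts = {"AA Level 1": 1, "AA Level 2": 3}
report = model3_report(ga_counts, aa_counts, enrolled=102)
print(report)
```

In this example the two distributions each total 100% of their own test takers, yet two enrolled students appear in neither, which is the gap that summing the denominators is meant to expose.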

Figure 3. Model 3, Different Proficiency Levels 

Alternate Assessment Proficiency Levels: Alternate Assessment Proficiency Level 1; Alternate Assessment Proficiency Level 2 

Pros. There are several pro statements that can be made about Model 3. Included among them 
are the following: 

• There is a clear distinction between the assessments. Each operates as a separate entity with 
separate rating scales. 

• The proficiency levels may be named differently, thus avoiding reporting students with 
significant disabilities in categories labeled as “failing” or “unsatisfactory.” 

• Statistical soundness issues resulting from the aggregation of proficiency levels from different 
assessments are avoided. 



Cons. Statements can also be made about Model 3 that are contrary to its support. The following 
are among these: 



• If states do not sum the number of students in both denominators to create a single 
denominator, it will be easier to leave some students out of the accountability system. 

• Scores on the alternate assessment will not be easy to use for accountability purposes, since 
they represent a very small number of students who will not fit into the reporting system 
developed for the majority. 

Implications. Model 3 has several implications for its use. Among the implications are the 
following: 

• It will be difficult to aggregate scores in the future, if that becomes necessary. 

• Reports to the public on the achievement of students taking the alternate assessment may be 
difficult since the number of students is often so small that it may fall below a state’s minimal 
number for reporting. 
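
The small-numbers problem noted above can be sketched in Python as a simple suppression rule. The function name and the minimum group size of 10 are hypothetical; each state sets its own minimum number for reporting.

```python
def aa_public_report(aa_counts, min_n=10):
    """Return AA percentages by level, or None when the group is too small.

    min_n is a hypothetical minimum group size, not a federal or state value.
    """
    n = sum(aa_counts.values())
    if n < min_n:
        return None  # suppressed: results could identify individual children
    return {level: 100.0 * count / n for level, count in aa_counts.items()}


# A single school often falls below the minimum; a district may not.
small_school = aa_public_report({"AA Level 1": 1, "AA Level 2": 3})
district = aa_public_report({"AA Level 1": 10, "AA Level 2": 30})
print(small_school)
print(district)
```

Under a rule like this, alternate assessment results may be publishable only at the district or state level, even though the students still count in the school's denominator.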

Model 4 

Model 4 is shown in Figure 4. This model is based on an alternate assessment development 
process in which the general standards were expanded for the alternate assessment by being 
mapped backwards from the grade level benchmarks. This process allows for skills assessed by 
the alternate assessment to begin at a lower level than a student must have to show proficiency 
in the general assessment. Often, these lower levels on the alternate assessment correspond to 
the “failing” level of the general assessment. Still, in this model, it is possible for a student who 
is difficult to assess, such as a Dr. Stephen Hawking or a Helen Keller, to use the alternate 
assessment process to demonstrate achievement on higher level skills comparable to those in 
the general assessment. If there were high stakes for students, such as earning a diploma, a 
student in this type of alternate assessment would be able to demonstrate skills to earn a diploma. 
It is possible to find alternate assessment scores reaching into levels 3 and 4 on the general 
assessment, which would then be comparable to the skills demonstrated on the general 
paper-and-pencil tests. 

Figure 4. Model 4, Overlapped Proficiency Levels 

Pros. Model 4 has several positive aspects to it. Included among the pro statements that can be 
made for Model 4 are the following: 

• The scales of the alternate assessment and the general assessment are arranged to show an 
accurate relationship between the different skills demonstrated on the different assessments 
based on how the alternate assessment was developed. 

• The alternate assessment scale allows skills to be demonstrated on the alternate assessment 
in the higher levels of the general assessment. 

• The names of the three proficiency levels on the alternate assessment can be different from 
the lowest level of the general assessment levels into which they are embedded, thus avoiding 
objectionable labels, such as “failing.” 

Cons. Statements can also be made about Model 4 that are contrary to its support. The following 
are among these: 

• Most students in the alternate assessment will be perceived as operating in the “failing” or 
lowest category. 

• If schools are the units of accountability, students in the alternate assessment may be perceived 
as lowering the ratings of the school. 

• Aggregation of scores from the alternate assessment and the general assessment will load on 
the lowest general assessment proficiency level. 

• It is challenging technically to accurately align the two scales, since students take either the 
general assessment or the alternate assessment. 

Implications. Model 4 has several implications for its use. Among the implications are the 
following: 

• When there are high stakes for students, it will be necessary to validate that scores earned on 
the alternate assessment in the diploma-granted categories are comparable to scores earned 
on the general assessment. 

• It is important to try to have a group of students who participate in both the alternate assessment 
and the general assessment. If a group of students participated in both assessments, it would 
be possible to scale the scores of the alternate assessment and the general assessment on a 
continuous scale. 



Model 5 

Model 5 is shown in Figure 5. This model puts all of the scores from the alternate assessment 
into a proficiency level below all of the proficiency levels on the general assessment. There are 
no proficiency level differences within the alternate assessment category. All students appear in 
the denominator. 
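
A minimal Python sketch of a Model 5 style report follows. All alternate assessment students are placed in one category below the general assessment levels, whatever their individual results. The function name, the use of level 0 as the code for that category, and the counts are assumptions made for illustration.

```python
def model5_report(ga_counts, aa_n):
    """Percent of all students per level, with every AA student at level 0."""
    total = sum(ga_counts.values()) + aa_n
    if total == 0:
        raise ValueError("no students to report")
    # Level 0 is a hypothetical code for "below all general assessment levels".
    report = {0: 100.0 * aa_n / total}
    for level in sorted(ga_counts):
        report[level] = 100.0 * ga_counts[level] / total
    return report


# Hypothetical school: 96 GA students and 4 AA students.
ga_counts = {1: 20, 2: 30, 3: 36, 4: 10}
report = model5_report(ga_counts, aa_n=4)
print(report)
```

Because only the number of alternate assessment students enters the calculation, no change in their performance can move the reported figures.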



Figure 5. Model 5, Lowest Possible Proficiency Level for Alternate Assessment 

Pros. There are not as many obvious pro statements that can be made about the approach 
represented by Model 5. However, two statements that have repeatedly been made are the 
following: 

• All students can appear in the denominator. 

• This approach maintains the integrity of a single high standard. 

Cons. Several statements that are “cons” to this approach have been identified. They are as 
follows: 

• The alternate assessment does not add value to the assessment system. 

• A state may be required to justify that all students who took the alternate assessment are 
below proficiency level 1 of the general assessment. 

• The designation of the scores from the alternate assessment as zero may have the same 
effect as the practice of exempting students from the assessment. 

• Assigning the lowest proficiency level scores provides no incentive for improving services 
or achievement for students in the alternate assessment because it does not recognize 
improvement in performance. 

Implications. Model 5 has several implications for its use. Among the implications are the 
following: 

• Educators may perceive the alternate assessment’s purpose solely as satisfying mandates, 
but providing no useful instructional information. 

• The value of assessing, and therefore educating, students who will not achieve a score above 
a zero may be questioned. 

• An alternative to this model is one in which all of the students who took the alternate 
assessment are lumped together into an “alternately assessed” category, which does not count 
in terms of their performance. 

Model 6 

Model 6 puts all of the scores from the alternate assessment into a category called “alternately 
assessed,” which counts the alternate assessment students as having participated, but does not 
include any performance information in the reports. All students appear in the denominator. 



Figure 6. Model 6, No Alternate Assessment Proficiency Levels 

Pros. A few positive statements can be made about the approach represented by Model 6. 
Included among them are the following: 

• All students can appear in the denominator. 

• There is no statistical confusion, since no results are reported. 

Cons. Several negative statements also can be made about the Model 6 approach. The following 
are among these: 

• The alternate assessment does not add value to the assessment system. 

• When no results are published, instructional information is lacking. 

• The designation of the scores from the alternate assessment as not counting in any way, 
other than as participation, may have the same effect as the practice of exempting students 
from the assessment. 

• Reporting participation without performance provides no incentive for improving services 
or achievement for students in the alternate assessment, because it does not recognize 
improvement in performance. 

Implications. Model 6 has several implications for its use. Among them are the following: 

• Educators may perceive the alternate assessment's purpose solely as satisfying mandates, 
but providing no useful instructional information. 

• The value of educating or assessing students whose achievement will not be reported may 
be questioned by educators. 



Conclusions 



This is the first year, 2001, that most states will publish public reports of their alternate assessment 
results. The models included here reflect a range of approaches that have either been suggested 
or implemented by the 50 states. Other models are likely to emerge as states gauge the impact 
of the reporting formats they select. 

The reporting models that have been identified thus far bring to light a realization that alternate 
assessments are part of an assessment system. While these assessments may have been developed 
by small teams of special educators (not in all states, of course, but in many), they must now be 
situated within an assessment program that includes all students. The existence of alternate 
assessments causes states to reflect on all of the components of the total system. Conversations 
about accommodations, non-standard accommodations and alternate assessment options have 
been renewed in many states now that broadly granted exemptions for some special students 
are no longer possible. 

The variety of methods created to report the results of alternate assessments demonstrates the 
struggle of states to incorporate these new assessments into an existing structure - one that 
previously did not have to address the achievement of students with significant needs, or in 
many cases, even their presence. There are states that clearly have all of their students in state 
reports, and states that have clearly described how all of their students with disabilities are 
doing. 

There are many ways to make visible the achievement of students with disabilities in state 
accountability systems. The interpretation of federal legislation relative to state practices will 
surely guide future practice. Thus, it is important to keep track of the various models that are 
used, to explore (as done in this paper) the potential pros and cons of each approach, as well 
as the implications of the use of each. Following this, it will be extremely important to monitor 
the impact of the different approaches over time. 







References 



Cohen, M. (2000, April 6). Letter and attachment (Summary guidance on the inclusion 
requirement for Title I final assessments). Washington, DC: Office of the Assistant Secretary 
for Elementary and Secondary Education. 

Thompson, S. J., & Thurlow, M. L. (2001). 2001 State special education outcomes: A report on 
activities at the beginning of a new decade. Minneapolis, MN: University of Minnesota, National 
Center on Educational Outcomes. 



Thurlow, M. L., & Wiener, D. (2000). Non-approved accommodations: Recommendations for 
use and reporting (Policy Directions 11). Minneapolis, MN: University of Minnesota, National 
Center on Educational Outcomes. 



The College of Education & Human Development 
University of Minnesota 

NCEO is an affiliated center of the Institute on Community Integration 

